Rigorous guarantees about the performance of predictive algorithms are necessary in order to ensure their responsible use. Previous work has largely focused on bounding the expected loss of a predictor, but this is not sufficient in many risk-sensitive applications where the distribution of errors is important. In this work, we propose a flexible framework to produce a family of bounds on quantiles of the loss distribution incurred by a predictor. Our method takes advantage of the order statistics of the observed loss values rather than relying on the sample mean alone. We show that a quantile is an informative way of quantifying predictive performance, and that our framework applies to a variety of quantile-based metrics, each targeting important subsets of the data distribution. We analyze the theoretical properties of our proposed method and demonstrate its ability to rigorously control loss quantiles on several real-world datasets.
translated by 谷歌翻译
最近的大规模自然语言处理(NLP)系统对大规模和多样化的语料库使用预先培训的大型语言模型(LLM)。实际上,预训练的模型通过对特定于任务的数据集进行微调来适应各种任务。 LLMS虽然有效,但已被证明可以记住培训数据的实例,从而有可能揭示在预训练期间处理的私人信息。潜在的泄漏可能会进一步传播到LLM经过微调的下游任务。另一方面,保存隐私的算法通常涉及从头开始的重新划痕,这对于LLM来说非常昂贵。在这项工作中,我们提出了一个简单,易于解释的,并且在解码阶段将其轻巧的扰动机制应用于已经训练的模型。我们的扰动机制是模型不可抑制的,可以与任何LLM结合使用。我们提供的理论分析表明,所提出的机制是私人的,实验结果显示了隐私 - 私人权衡权衡。
translated by 谷歌翻译
变异因素之间的相关性在现实数据中普遍存在。机器学习算法可能会受益于利用这种相关性,因为它们可以提高噪声数据的预测性能。然而,通常这种相关性不稳定(例如,它们可能在域,数据集或应用程序之间发生变化),我们希望避免利用它们。解剖学方法旨在学习捕获潜伏子空间变化不同因素的表示。常用方法涉及最小化潜伏子空间之间的相互信息,使得每个潜在的底层属性。但是,当属性相关时,这会失败。我们通过强制执行可用属性上的子空间之间的独立性来解决此问题,这允许我们仅删除不导致的依赖性,这些依赖性是由于训练数据中存在的相关结构。我们通过普发的方法实现这一目标,以最小化关于分类变量的子空间之间的条件互信息(CMI)。我们首先在理论上展示了CMI最小化是对高斯数据线性问题的稳健性解剖的良好目标。然后,我们基于MNIST和Celeba在现实世界数据集上应用我们的方法,并表明它会在相关偏移下产生脱屑和强大的模型,包括弱监督设置。
translated by 谷歌翻译
Learning models that gracefully handle distribution shifts is central to research on domain generalization, robust optimization, and fairness. A promising formulation is domain-invariant learning, which identifies the key issue of learning which features are domain-specific versus domaininvariant. An important assumption in this area is that the training examples are partitioned into "domains" or "environments". Our focus is on the more common setting where such partitions are not provided. We propose EIIL, a general framework for domain-invariant learning that incorporates Environment Inference to directly infer partitions that are maximally informative for downstream Invariant Learning. We show that EIIL outperforms invariant learning methods on the CMNIST benchmark without using environment labels, and significantly outperforms ERM on worst-group performance in the Waterbirds and CivilComments datasets. Finally, we establish connections between EIIL and algorithmic fairness, which enables EIIL to improve accuracy and calibration in a fair prediction problem.
translated by 谷歌翻译
Deep learning has triggered the current rise of artificial intelligence and is the workhorse of today's machine intelligence. Numerous success stories have rapidly spread all over science, industry and society, but its limitations have only recently come into focus. In this perspective we seek to distil how many of deep learning's problem can be seen as different symptoms of the same underlying problem: shortcut learning. Shortcuts are decision rules that perform well on standard benchmarks but fail to transfer to more challenging testing conditions, such as real-world scenarios. Related issues are known in Comparative Psychology, Education and Linguistics, suggesting that shortcut learning may be a common characteristic of learning systems, biological and artificial alike. Based on these observations, we develop a set of recommendations for model interpretation and benchmarking, highlighting recent advances in machine learning to improve robustness and transferability from the lab to real-world applications. This is the preprint version of an article that has been published by Nature Machine Intelligence
translated by 谷歌翻译
Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system's constituent parts. In this work, we introduce the neural relational inference (NRI) model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational auto-encoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks. In experiments on simulated physical systems, we show that our NRI model can accurately recover ground-truth interactions in an unsupervised manner. We further demonstrate that we can find an interpretable structure and predict complex dynamics in real motion capture and sports tracking data.
translated by 谷歌翻译
We propose prototypical networks for the problem of few-shot classification, where a classifier must generalize to new classes not seen in the training set, given only a small number of examples of each new class. Prototypical networks learn a metric space in which classification can be performed by computing distances to prototype representations of each class. Compared to recent approaches for few-shot learning, they reflect a simpler inductive bias that is beneficial in this limited-data regime, and achieve excellent results. We provide an analysis showing that some simple design decisions can yield substantial improvements over recent approaches involving complicated architectural choices and meta-learning. We further extend prototypical networks to zero-shot learning and achieve state-of-theart results on the CU-Birds dataset.
translated by 谷歌翻译
We study characteristics of receptive fields of units in deep convolutional networks. The receptive field size is a crucial issue in many visual tasks, as the output must respond to large enough areas in the image to capture information about large objects. We introduce the notion of an effective receptive field, and show that it both has a Gaussian distribution and only occupies a fraction of the full theoretical receptive field. We analyze the effective receptive field in several architecture designs, and the effect of nonlinear activations, dropout, sub-sampling and skip connections on it. This leads to suggestions for ways to address its tendency to be too small.
translated by 谷歌翻译
Graph-structured data appears frequently in domains including chemistry, natural language semantics, social networks, and knowledge bases. In this work, we study feature learning techniques for graph-structured inputs. Our starting point is previous work on Graph Neural Networks (Scarselli et al., 2009), which we modify to use gated recurrent units and modern optimization techniques and then extend to output sequences. The result is a flexible and broadly useful class of neural network models that has favorable inductive biases relative to purely sequence-based models (e.g., LSTMs) when the problem is graph-structured. We demonstrate the capabilities on some simple AI (bAbI) and graph algorithm learning tasks. We then show it achieves state-of-the-art performance on a problem from program verification, in which subgraphs need to be described as abstract data structures.
translated by 谷歌翻译
Books are a rich source of both fine-grained information, how a character, an object or a scene looks like, as well as high-level semantics, what someone is thinking, feeling and how these states evolve through a story. This paper aims to align books to their movie releases in order to provide rich descriptive explanations for visual content that go semantically far beyond the captions available in current datasets.To align movies and books we exploit a neural sentence embedding that is trained in an unsupervised way from a large corpus of books, as well as a video-text neural embedding for computing similarities between movie clips and sentences in the book. We propose a context-aware CNN to combine information from multiple sources. We demonstrate good quantitative performance for movie/book alignment and show several qualitative examples that showcase the diversity of tasks our model can be used for.
translated by 谷歌翻译